IRCAM Corpus Tools: Managing speech corpora

Authors

  • Grégory Beller
  • Christophe Veaux
  • Gilles Degottex
  • Nicolas Obin
  • Pierre Lanchantin
  • Xavier Rodet
Abstract

Corpus-based methods are increasingly used in speech technology applications and in the development of theoretical or computational models of spoken languages. These uses range from unit selection speech synthesis to statistical modeling of speech phenomena such as prosody or expressivity. In all cases, they require a wide range of tools for corpus creation, labeling, symbolic and acoustic analysis, storage, and querying. However, although tools exist for each of these individual tasks, they are rarely integrated into a single platform available to a large community of researchers. In this paper, we propose IrcamCorpusTools, an open and easily extensible platform for the analysis, querying, and visualization of speech corpora. It is already used for unit selection speech synthesis, for prosody and expressivity studies, and to exploit various corpora of spoken French and other languages.

Keywords: speech, corpus, database, query language, multimodality.

Similar resources

Praaline: Integrating Tools for Speech Corpus Research

This paper presents Praaline, an open-source software system for managing, annotating, analysing and visualising speech corpora. Researchers working with speech corpora are often faced with multiple tools and formats, and they need to work with ever-increasing amounts of data in a collaborative way. Praaline integrates and extends existing time-proven tools for spoken corpora analysis (Praat, S...

For Standardised Amazigh Linguistic Resources

Amazigh language and culture may well be viewed to have known an unprecedented booming in Morocco : more than a hundred which are published by the Royal Institute of Amazigh Culture (IRCAM), an institution created in 2001 to preserve, promote and endorse Amazigh culture in all its dimensions. Crucially, publications in the Amazigh language would not have seen light without the valiant attempts t...

Querying Annotated Speech Corpora

This paper is concerned with querying annotated speech corpora. A growing number of such corpora is currently being created worldwide; however, their usefulness for a wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions for these problems are presented here: the XML-based data format TASX for corpus crea...

Festvox: Tools for Creation and Analyses of Large Speech Corpora

This paper summarises the tools provided within Festvox[1], a freely available software suite for creation and analyses of large scale speech corpora for enabling research, development and instruction in speech technologies.

Concordancing for parallel spoken language corpora

Concordancing is one of the oldest corpus analysis tools, especially for written corpora. In NLP, concordancing appears in the training of speech recognition systems. Additionally, comparative studies of different languages result in parallel corpora. Concordancing for these corpora in an NLP context is a new approach. We propose to combine these fields of interest for a multi-purpose concordance for ...

Journal title:
  • TAL

Volume 49

Publication date: 2008